Understanding Member Engagement at Wellspring:

Key Drivers of Attendance and Service Utilization

Carrie Lam, Danni Luo, Xinyue Pu, Xiaotong Shen | TUT0202-B

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = FALSE,       # Hide code in the slides
  include = TRUE,     # Still include each chunk's output (plots, tables)
  warning = FALSE,    # Suppress warnings
  message = FALSE,    # Suppress messages
  results = "markup"  # Show printed output (knitr's default)
)
```

Note: the options above hide the code chunks in the slides while still displaying their output.

There are many other options you can use to customize which parts of the code are run or displayed, as well as how the output is displayed. You can learn more at https://quarto.org/docs/computations/execution-options.html



## Introduction:
Wellspring is a Canada-wide network of charities offering free cancer support programs to individuals at any stage of their cancer journey. On March 4, 2024, Wellspring implemented a simplified registration system with the goal of increasing member engagement. This project aims to analyze how this change, along with other factors, influences member engagement in Wellspring’s programs. Specifically, we will explore three key questions:
1. How do age and program interests influence average monthly attendance?
Using multiple linear regression, we’ll analyze these relationships to help Wellspring tailor programs for different age groups and interests.
2. Does POC (Person of Color) status affect the total number of services a member attends?
A hypothesis test will assess whether attendance differs by POC status, helping Wellspring ensure equitable access to programs.
3. Can registration date, mailing province, and cancer risk predict attendance in the past three months?
Classification trees will be used to predict attendance, enabling Wellspring to identify members needing additional support and allocate resources effectively.
 
## Population and Audience:
Population: All potential and current members of Wellspring in Canada, including cancer patients and their families.
Audience: Wellspring staff, who are experts in cancer support services but may not have formal training in statistics. Our findings will be presented with clear, actionable insights to help them improve program delivery and member engagement.
 
## Question 1: Age, Program Interests, and Average Monthly Attendance
 
Variables Used:

- age_years: Age of the member.
- num_program_interests: Number of programs the member is interested in.
- number_of_present_service_deliveries: Number of services the member attended.
- member_start_year and member_start_month: Year and month the member joined Wellspring.

Data Wrangling:

- Kept only members with valid age data (age_years >= 0).
- Created a new variable, degree_of_focus, categorizing members by their number of program interests:
    - 1 program of interest
    - 2-4 programs of interest
    - 5-7 programs of interest
    - No program of interest
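
The degree_of_focus grouping can be sketched in base R with `cut()`. This is a minimal sketch, assuming num_program_interests is a whole-number count between 0 and 7 (the largest category above); the helper name `assign_degree_of_focus` is hypothetical.

```r
# Hypothetical helper: map a count of program interests (assumed 0-7)
# to the four degree_of_focus labels used on this slide
assign_degree_of_focus <- function(num_program_interests) {
  cut(num_program_interests,
      breaks = c(-Inf, 0, 1, 4, 7),  # intervals (-Inf,0], (0,1], (1,4], (4,7]
      labels = c("no program of interest",
                 "1 program of interest",
                 "2-4 programs of interest",
                 "5-7 programs of interest"))
}

# Example: counts outside 0-7 would become NA
assign_degree_of_focus(c(0, 1, 3, 6))
```

Counting members per group is then a matter of calling `table()` on the result.
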

We then calculated average_attendance_per_month as the number of services attended divided by the number of months of membership, counting the joining month:

$$
\text{Average Attendance} = \frac{\text{Number of Services Attended}}{12 \times (2024 - \text{Member Start Year}) + (12 - \text{Member Start Month} + 1)}
$$

Note that a member who joined in December 2024 counts as 1 month of membership, one who joined in November 2024 as 2 months, and one who joined in October 2023 as 15 months. Counting the joining month avoids division by zero while keeping the many events held during the final month of the data (December 2024).
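
The denominator can be checked against the worked examples in the note. A minimal base-R sketch, assuming the data end in December 2024 as stated above; both function names are hypothetical.

```r
# Months counted from the joining month through December 2024 (inclusive),
# matching the denominator of the average-attendance formula
months_since_joining <- function(member_start_year, member_start_month,
                                 end_year = 2024, end_month = 12) {
  (end_year - member_start_year) * 12 + (end_month - member_start_month + 1)
}

# Services attended divided by months of membership
average_attendance_per_month <- function(number_of_present_service_deliveries,
                                         member_start_year, member_start_month) {
  number_of_present_service_deliveries /
    months_since_joining(member_start_year, member_start_month)
}

months_since_joining(c(2024, 2024, 2023), c(12, 11, 10))  # 1 2 15
average_attendance_per_month(30, 2023, 10)                 # 30 / 15 = 2
```

Because the joining month is always counted, the denominator is at least 1 for any member who joined by December 2024.
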

Visualization:

- A scatter plot with a linear regression line showing the relationship between age, program interests, and average monthly attendance (Figure 1).
- A table summarizing the number of members in each degree of focus.
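
The multiple linear regression for this question can be sketched with base R's `lm()`. The data frame below is simulated toy data, not Wellspring data, generated from assumed coefficients so the fit can be checked; only the column names come from the variable list above.

```r
set.seed(2024)

# Simulated toy data (NOT Wellspring data) using this slide's column names
n <- 200
toy <- data.frame(
  age_years = runif(n, min = 20, max = 90),
  num_program_interests = sample(0:7, n, replace = TRUE)
)
# Assumed "true" relationship, used only to generate the toy outcome
toy$average_attendance_per_month <- 0.5 + 0.02 * toy$age_years +
  0.3 * toy$num_program_interests + rnorm(n, sd = 0.5)

# Multiple linear regression: attendance explained by age and program interests
fit <- lm(average_attendance_per_month ~ age_years + num_program_interests,
          data = toy)
summary(fit)$coefficients  # estimates, standard errors, t values, p-values
```

Each coefficient estimates the change in average monthly attendance for a one-unit increase in that variable, holding the other fixed. On real data, the residual plots from `plot(fit)` are a quick check of the linearity assumption.
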

## Plot

::: {.cell}
::: {.cell-output-display}
![](TUT0202-B-Final-Project_files/figure-revealjs/unnamed-chunk-1-1.png){width=960}
:::

::: {.cell-output .cell-output-stdout}

| degree_of_focus          |    n |
|--------------------------|-----:|
| 1 program of interest    |  101 |
| 2-4 programs of interest |  162 |
| 5-7 programs of interest |   94 |
| no program of interest   | 2492 |

:::
:::


## Conclusion: 1. Age, Program Interests, and Average Monthly Attendance
We found that both age and the number of program interests are associated with average monthly attendance: older members and those with more program interests tend to have higher attendance rates. This suggests that Wellspring could:

- Develop targeted programs for younger members to increase their engagement.
- Encourage members to explore multiple programs by promoting cross-program participation.
 
## Question 2: Impact of POC Status on Engagement

::: columns
::: {.column width="50%"}
::: {style="font-size: 0.9em;"}
Variables Used:

- member_id: Unique ID of the member
- i_identify_as_poc: Whether the member identifies as a Person of Color (POC).
- attendance_status: Attendance records for each member.

Hypothesis Test:

- Conducted a randomization test to compare mean attendance between POC and non-POC members.
- Calculated the p-value to assess the significance of the observed difference.
:::
:::

::: {.column width="50%"}
::: {style="font-size: 0.9em;"}
Data Wrangling:

- Grouped attendance records by member_id to calculate total_attendance (number of services attended).
- Joined attendance data with POC status data.
- Filtered out members with missing POC status.
- Created scenarios to categorize members:
    1. Never signed up in 2023-2024.
    2. Signed up but never attended.
    3. Attended zero events in total (scenarios 1 and 2 combined).

:::
:::
:::
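
The grouping and joining steps above can be sketched in base R. This is a minimal sketch on hypothetical toy tables: it assumes attendance records have one row per sign-up, that `attendance_status == "Present"` marks an attended service (the real coding may differ), and that the POC table lists every member, so members with no records end up with a total of 0 (scenario 1).

```r
# Toy attendance records (hypothetical): one row per service sign-up
attendance <- data.frame(
  member_id = c(1, 1, 2, 3, 3, 3),
  attendance_status = c("Present", "Absent", "Absent",
                        "Present", "Present", "Present")
)
# Toy POC table (hypothetical): one row per member, NA = not reported
poc <- data.frame(
  member_id = c(1, 2, 3, 4),
  i_identify_as_poc = c(TRUE, FALSE, NA, TRUE)
)

total_attendance_by_member <- function(attendance, poc) {
  # Count attended services per member
  attendance$attended <- attendance$attendance_status == "Present"
  totals <- aggregate(attended ~ member_id, data = attendance, FUN = sum)
  names(totals)[names(totals) == "attended"] <- "total_attendance"
  # Left join onto the member table; members with no records attended 0 services
  joined <- merge(poc, totals, by = "member_id", all.x = TRUE)
  joined$total_attendance[is.na(joined$total_attendance)] <- 0
  # Drop members with missing POC status
  joined[!is.na(joined$i_identify_as_poc), ]
}

total_attendance_by_member(attendance, poc)
```

In this toy example, member 2 signed up but never attended (scenario 2), member 4 has no records at all (scenario 1), and member 3 is dropped for missing POC status.
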


## Diagram 2

::: {.cell}
::: {.cell-output-display}
![](TUT0202-B-Final-Project_files/figure-revealjs/unnamed-chunk-2-1.png){width=960}
:::
:::

## Diagram 3 and Table of Inactive Members

::: {.cell}
::: {.cell-output-display}
![](TUT0202-B-Final-Project_files/figure-revealjs/unnamed-chunk-3-1.png){width=960}
:::

::: {.cell-output .cell-output-stdout}

| scenario                                    | people |
|---------------------------------------------|-------:|
| people never sign up in 2023-2024           |    105 |
| people sign up only once but not showing up |      5 |
| all people that attend 0 event              |    110 |

:::
:::

## Test statistic and P-value

::: {.cell}
::: {.cell-output .cell-output-stdout}

| i_identify_as_poc |   n | mean |
|-------------------|----:|-----:|
| FALSE             | 296 | 10.6 |
| TRUE              |  46 | 12.3 |

:::

::: {.cell-output .cell-output-stdout}

Test statistic (difference in mean total attendance, POC minus non-POC): 1.716069

:::

::: {.cell-output .cell-output-stdout}

P-value: 0.636

:::
:::
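
The randomization test can be sketched in base R: shuffle the POC labels many times, recompute the difference in means each time, and report the share of shuffled differences at least as extreme as the observed one. This is a minimal two-sided sketch; the number of shuffles (`reps`) and the seed are assumptions, so it will not necessarily reproduce the exact p-value above.

```r
# Two-sided randomization (permutation) test for a difference in means.
# is_poc must be a logical vector with no NA values.
randomization_test <- function(total_attendance, is_poc, reps = 1000, seed = 1) {
  set.seed(seed)
  # Observed statistic: mean(POC) - mean(non-POC)
  observed <- mean(total_attendance[is_poc]) - mean(total_attendance[!is_poc])
  # Null distribution: shuffled labels make POC status unrelated to attendance
  simulated <- replicate(reps, {
    shuffled <- sample(is_poc)
    mean(total_attendance[shuffled]) - mean(total_attendance[!shuffled])
  })
  # Share of shuffled differences at least as far from 0 as the observed one
  p_value <- mean(abs(simulated) >= abs(observed))
  list(statistic = observed, p_value = p_value)
}

# Toy example (hypothetical values): clearly separated groups give a small p-value
attended <- c(rep(10, 10), rep(0, 10))
poc_flag <- rep(c(TRUE, FALSE), each = 10)
randomization_test(attended, poc_flag)
```

On the real data, `statistic` plays the role of the difference printed above (about 1.7). A p-value as large as 0.636 means differences of that size arise often when POC status is unrelated to attendance.
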

## Conclusion: 2. Impact of POC Status on Engagement
Our randomization test found no statistically significant difference in attendance between members who identify as POC and those who do not (p-value = 0.636). However, attendance was heavily skewed toward low values in both groups, indicating that many members attend few or no events. This highlights the need for:

- Further investigation into barriers to engagement, such as accessibility or program relevance.
- Targeted outreach to ensure all members feel welcome and supported.

 
## Question 3: Predicting Attendance Using Registration Date, Mailing Province, and Cancer Risk
::: columns
::: {.column width="50%"}
Goals:

- Identify trends in variable significance
- Use these variables to predict the attendance/active status of an individual

Applications:

- Adjust web content or services depending on members' personal information.

:::
::: {.column width="50%"}
Variables Used:

- mailing province: Categorized as East or West Canada.
- registration date: The member's registration date, used to indicate whether they registered before or after the March 2024 registration change.
- cancer risk: Categorized as high or low risk.
- last service date before March 2024: Boolean indicating whether the member attended a service in the past three months (the response variable being predicted).
:::
:::
  


## Data Wrangling + Method
Data Wrangling:

- Created factors for mailing province (East/West), registration before March 3rd (TRUE/FALSE), and cancer risk (high/low).
- Used whether the member's last service date falls within the past three months as the target variable for prediction.
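
The factor-creation step can be sketched in base R. This is a hypothetical sketch: the analysis does not list which provinces were grouped as West, so the province vectors below are an assumption (the four western provinces versus the rest), and the registration cutoff defaults to March 4, 2024, the launch date given in the introduction; pass a different `cutoff` if the analysis used another boundary.

```r
# ASSUMPTION: East/West grouping of provinces (not specified in the analysis)
west_provinces <- c("British Columbia", "Alberta", "Saskatchewan", "Manitoba")
east_provinces <- c("Ontario", "Quebec", "New Brunswick", "Nova Scotia",
                    "Prince Edward Island", "Newfoundland",
                    "Newfoundland and Labrador")

make_region <- function(mailing_province) {
  region <- ifelse(mailing_province %in% west_provinces, "West",
                   ifelse(mailing_province %in% east_provinces, "East", NA))
  factor(region, levels = c("East", "West"))  # non-Canadian entries become NA
}

# TRUE if the member registered before the cutoff (ASSUMED default date)
make_registered_before <- function(registration_date,
                                   cutoff = as.Date("2024-03-04")) {
  factor(registration_date < cutoff, levels = c(FALSE, TRUE))
}

make_region(c("British Columbia", "Ontario", "Manitoba", "Texas"))
make_registered_before(as.Date(c("2024-03-03", "2024-03-04")))
```

Mapping unknown provinces or states to NA, rather than to "East", keeps non-Canadian addresses from silently inflating one group.
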

Method:

- Bar Plot (evaluating trends in variable significance)
- P-Values (to evaluate variable significance more precisely)
- Classification Trees (prediction)
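
A classification tree of this kind can be fitted with the `rpart` package, a recommended package distributed with R. The sketch below runs on simulated toy data, not Wellspring data, in which activity depends mostly on the registration indicator (the direction of that toy rule is arbitrary); the column names mirror this section's variables but are assumptions, as are the default tree settings.

```r
library(rpart)
set.seed(42)

# Simulated toy data (NOT Wellspring data); all predictors are factors
n <- 500
toy <- data.frame(
  region = factor(sample(c("East", "West"), n, replace = TRUE, prob = c(0.87, 0.13))),
  registered_before = factor(sample(c(FALSE, TRUE), n, replace = TRUE)),
  cancer_risk = factor(sample(c("High", "Low"), n, replace = TRUE))
)
# Toy outcome driven by registered_before, with about 10% of labels flipped as noise
active <- toy$registered_before == "TRUE"
flip <- runif(n) < 0.1
toy$status <- factor(ifelse(xor(active, flip), "Active", "Non-Active"))

# Fit a classification tree (method = "class") on the three predictors
tree <- rpart(status ~ region + registered_before + cancer_risk,
              data = toy, method = "class")

# Predicted classes and confusion matrix (rows = actual, columns = predicted)
predicted <- predict(tree, newdata = toy, type = "class")
table(actual = toy$status, predicted = predicted)
```

Accuracy measured on the training data, as here, is optimistic; holding out a test set gives a fairer estimate.
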





## Data Wrangling

Now, let's look at the data. There are 129 unique cancer types and 7 unique mailing provinces/states; the tables below show a sample of each.



::: columns

::: {.column width="50%"}

::: {.cell}
::: {.cell-output-display}
`````{=html}
<table>
<caption>Unique Types of Cancer</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Type_of_Cancer </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Breast </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Brain </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Leukemia </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Ovarian </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Other </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Colorectal </td>
  </tr>
</tbody>
</table>

`````
:::
:::

:::

::: {.column width="50%"}

Table: Unique Mailing Provinces/States

| Mailing_Province_State |
|:-----------------------|
| British Columbia       |
| Manitoba               |
| Newfoundland           |
| Nova Scotia            |
| Ontario                |
| Quebec                 |

:::
:::

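## Visualising Registration Distribution

Table: Membership Distribution

| Registered Before 03/2024 | Number of Users | Percentage of Total (%) |
|---------------------------|----------------:|------------------------:|
| FALSE                     |            1907 |                    39.5 |
| TRUE                      |            2922 |                    60.5 |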

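## Canadian Regions Distribution

Table: Canadian Regions Distribution

| Canadian Region | Number of Users | Percentage of Total (%) |
|-----------------|----------------:|------------------------:|
| East            |            3746 |                    87.1 |
| West            |             555 |                    12.9 |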

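## Cancer Risk Distribution

Table: Cancer Risk Distribution

| Level of Risk | Number of Users | Percentage of Total (%) |
|---------------|----------------:|------------------------:|
| High Risk     |            1377 |                    41.1 |
| Low Risk      |            1973 |                    58.9 |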

## Visualising Variables' Statistical Influence/Significance

Table: P-Values

| Variable          |     P-Value |
|-------------------|------------:|
| Cancer Risk       |   0.1484035 |
| Region (Canada)   |   0.0008374 |
| Before March 2024 | < 0.0000001 |

## Tree Model

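## Tree Accuracy

Table: Confusion Matrix

| Predicted/Actual | Active | Non-Active |
|------------------|-------:|-----------:|
| Active           |   1649 |        305 |
| Non-Active       |    159 |        540 |

Table: Accuracy Table

| Metric           | Percentage (%) |
|------------------|---------------:|
| Overall Accuracy |           82.5 |
| Sensitivity      |           77.3 |
| Specificity      |           84.4 |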
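
The accuracy table can be recomputed from the confusion matrix. In this short base-R sketch, the reported sensitivity (77.3%) and specificity (84.4%) are reproduced when the matrix rows are read as actual classes, the columns as predicted classes, and Non-Active is treated as the positive class; the helper name `classification_metrics` is hypothetical.

```r
# Confusion matrix from the slide, read as rows = actual, columns = predicted
conf_mat <- matrix(c(1649, 305,
                     159, 540),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("Active", "Non-Active"),
                                   predicted = c("Active", "Non-Active")))

# Accuracy, sensitivity and specificity for a chosen positive class
classification_metrics <- function(cm, positive) {
  negative <- setdiff(rownames(cm), positive)
  tp <- cm[positive, positive]  # positives predicted as positive
  tn <- cm[negative, negative]  # negatives predicted as negative
  fn <- cm[positive, negative]  # positives missed
  fp <- cm[negative, positive]  # negatives flagged as positive
  c(accuracy = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

round(100 * classification_metrics(conf_mat, positive = "Non-Active"), 1)
# accuracy = 82.5, sensitivity = 77.3, specificity = 84.4
```

Under this reading, the tree correctly flags about 77% of non-active members, the group most relevant for outreach.
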

## Summary/Conclusion: Predicting Attendance Using Registration Date, Mailing Province, and Cancer Risk

The classification tree analysis showed that registration timing (before or after March 2024) was the primary predictor of attendance, while cancer risk and mailing province added little. This suggests that:

- The simplified registration system has a strong positive association with attendance, which is consistent with the change being effective, although an observational analysis like this cannot prove causation.
- Other factors matter much less: cancer risk was not statistically significant (p ≈ 0.15), and although region was (p ≈ 0.0008), it appears far less important than registration timing.

## Limitations:

1. Data Limitations

- Limited Demographic Variables: The dataset lacks detailed demographic information, such as income level, education, or specific cultural background. These variables could provide deeper insight into engagement patterns and help tailor programs to diverse needs. Future Improvement: Collect additional demographic data through member surveys or registration forms.
- Self-Reported POC Status: The i_identify_as_poc variable relies on a single self-reported yes/no answer, which may not capture the full complexity of racial and ethnic identities. Future Improvement: Include more granular categories for racial and ethnic identification to better understand engagement disparities.
- Cancer Type and Stage: Our questions do not analyze the type or stage of cancer, which could affect a member's ability to attend programs. Future Improvement: Incorporate cancer type and stage data to analyze how these factors affect engagement.
- Geographic Granularity: Grouping mailing_province into East and West Canada may oversimplify geographic differences. Future Improvement: Use more specific geographic data (e.g., city or postal code) to identify regional trends in engagement.

2. Statistical Method Limitations

- Linear Regression Assumptions: The multiple linear regression in Question 1 assumes a linear relationship between age, program interests, and attendance, but engagement patterns may be more complex. Future Improvement: Explore non-linear models or machine learning techniques to capture more nuanced relationships.
- Randomization Test Limitations: The randomization test in Question 2 compares the observed difference with differences obtained by randomly shuffling POC labels, that is, under the null hypothesis that POC status is unrelated to attendance. It does not adjust for unmeasured confounding variables (e.g., socioeconomic status) that could mask or create a difference. Future Improvement: Use multiple regression or propensity score matching to control for potential confounders.
- Classification Tree Simplicity: The single classification tree in Question 3 identified the registration system change as the primary predictor of attendance, which may oversimplify the relationship; other variables (e.g., program type, member preferences) could also play a role. Future Improvement: Use ensemble methods such as random forests or gradient boosting, which typically give more stable predictions and capture interactions between variables more reliably.

## Overall Conclusion:

Connections Between Findings: The findings collectively highlight the importance of member characteristics (age, program interests) and systemic factors (the registration system) in driving engagement. While the registration system change is strongly associated with higher attendance, there is still room to improve engagement among specific subgroups, such as younger members and those with fewer program interests.

Next Steps and Future Analyses:

- Tailored Programs: Develop programs that appeal to younger members and those with limited program interests.
- Equity and Inclusion: Conduct surveys to better understand barriers to engagement, particularly for underrepresented groups.
- Long-Term Analysis: Extend the analysis to multi-year data to assess long-term trends and the sustained impact of the registration system change.